
    A methodology for the characterization of business-to-consumer E-commerce.

    This thesis concerns the field of business-to-consumer electronic commerce. Research on Internet consumer behaviour is still in its infancy, and a quantitative framework to characterize user profiles for e-commerce is not yet established. This study proposes a quantitative framework that uses latent variable analysis to identify the underlying traits of Internet users' opinions. Predictive models are then built to select the factors that are most predictive of the propensity to buy online and to classify Internet users according to that propensity. This is followed by a segmentation of the online market based on that selection of factors and the deployment of segment-specific graphical models to map the interactions between factors, and between these and the propensity to buy online. The novel aspects of this work can be summarised as follows: the definition of a fully quantitative methodology for the segmentation and analysis of large data sets; the description of the latent dimensions underlying consumers' opinions using quantitative methods; the definition of a principled method of marginalisation to the empirical prior, for Bayesian neural networks, to deal with the use of class-unbalanced data sets; and a study of the Generative Topographic Mapping (GTM) as a principled method for market segmentation, including several developments of the model, namely: a) an entropy-based measure to compare the class-discriminatory capabilities of maps of equal dimensions; b) a Cumulative Responsibility measure to provide information on the mapping distortion and define data clusters; and c) Selective Smoothing as an extended model for the regularization of GTM training.
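
The entropy-based measure mentioned in point a) can be illustrated with a minimal sketch. This is only an assumption about its general form, not the thesis's exact definition: a data-weighted Shannon entropy of the class distribution at each map node, where lower values indicate a more class-discriminatory map. The function names and the `assignments`/`labels` encoding are hypothetical.

```python
import numpy as np

def node_entropy(class_counts):
    """Shannon entropy (bits) of the class distribution at one map node;
    0 = perfectly class-discriminative, log2(n_classes) = uninformative."""
    p = np.asarray(class_counts, dtype=float)
    p = p[p > 0] / p.sum()
    return float(-(p * np.log2(p)).sum())

def map_entropy(assignments, labels, n_nodes):
    """Data-weighted mean node entropy over a fitted map's nodes."""
    total, n = 0.0, len(labels)
    for node in range(n_nodes):
        mask = assignments == node
        if mask.any():
            counts = np.bincount(labels[mask])
            total += mask.sum() / n * node_entropy(counts)
    return total
```

Under this reading, two maps of equal dimensions can be compared directly by their `map_entropy` values on the same labelled data.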

    Bioinformatics and Medicine in the Era of Deep Learning

    Many of the current scientific advances in the life sciences have their origin in the intensive use of data for knowledge discovery. In no area is this clearer than in bioinformatics, led by technological breakthroughs in data acquisition technologies. It has been argued that bioinformatics could quickly become the field of research generating the largest data repositories, beating other data-intensive areas such as high-energy physics or astroinformatics. Over the last decade, deep learning has become a disruptive advance in machine learning, giving new life to the long-standing connectionist paradigm in artificial intelligence. Deep learning methods are ideally suited to large-scale data and, therefore, should be ideally suited to knowledge discovery in bioinformatics and biomedicine at large. In this brief paper, we review key aspects of the application of deep learning in bioinformatics and medicine, drawing from the themes covered by the contributions to an ESANN 2018 special session devoted to this topic.

    Influencing factors in energy use of housing blocks: a new methodology, based on clustering and energy simulations, for decision making in energy refurbishment projects

    In recent years, considerable effort has been dedicated to identifying the factors with the highest influence on the energy consumption of residential buildings. These factors include aspects such as weather dependence, user behaviour, socio-economic situation, type of energy installations, and building typology. The high number of factors increases the complexity of analysis and leads to a lack of confidence in the results of energy simulation analysis. This problem grows when we move one step up and perform global analysis of blocks of buildings. The aim of this study is to report a new methodology for the assessment of the energy performance of large groups of buildings when considering the real use of energy. We combine two clustering methods, Generative Topographic Mapping and k-means, to obtain reference dwellings that can be considered representative of the different energy patterns and energy systems of the neighbourhood. Then, simulation of energy demand and indoor temperature against the monitored comfort conditions in a short period is performed to obtain end-use load disaggregation. This methodology was applied in a district of Terrassa City (Spain), and six reference dwellings were selected. Results showed that the method was able to identify the main patterns and provide occupants with feasible recommendations so that they can make the required decisions at neighbourhood level. Moreover, given that the proposed method is based on comparison with similar buildings, it could motivate building occupants to implement community improvement actions, as well as to modify their behaviour.
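
The selection of reference dwellings can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses plain k-means with farthest-point initialization (the study combines GTM with k-means), then picks, for each cluster, the real dwelling closest to the centroid. All names and the feature encoding are hypothetical.

```python
import numpy as np

def kmeans(X, k, iters=100):
    """Plain k-means with farthest-point initialization."""
    centroids = [X[0]]
    for _ in range(1, k):
        # Next seed: the point farthest from all current centroids.
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[int(np.argmax(d))])
    centroids = np.asarray(centroids)
    for _ in range(iters):
        # Assign each dwelling to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centroids (keep the old one if a cluster empties).
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, labels

def reference_dwellings(X, centroids, labels):
    """Index of the real dwelling closest to each cluster centroid."""
    refs = []
    for j, c in enumerate(centroids):
        members = np.where(labels == j)[0]
        nearest = np.argmin(np.linalg.norm(X[members] - c, axis=1))
        refs.append(int(members[nearest]))
    return refs
```

Choosing an actual monitored dwelling, rather than the centroid itself, is what makes detailed simulation and comparison against measured comfort conditions possible for each cluster.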

    Societal issues in machine learning: When learning from data is not enough

    It has been argued that Artificial Intelligence (AI) is experiencing a fast process of commodification. Such a characterization is in the interest of big IT companies, but it correctly reflects the current industrialization of AI. This phenomenon means that AI systems and products are reaching society at large and, therefore, that societal issues related to the use of AI and Machine Learning (ML) cannot be ignored any longer. Designing ML models from this human-centered perspective means incorporating human-relevant requirements such as safety, fairness, privacy, and interpretability, but also considering broad societal issues such as ethics and legislation. These are essential aspects to foster the acceptance of ML-based technologies, as well as to ensure compliance with an evolving legislation concerning the impact of digital technologies on ethically and privacy-sensitive matters. The ESANN special session for which this tutorial acts as an introduction aims to showcase the state of the art on these increasingly relevant topics among ML theoreticians and practitioners. For this purpose, we welcomed both solid contributions and preliminary relevant results showing the potential, the limitations, and the challenges of new ideas, as well as refinements or hybridizations among the different fields of research, ML, and related approaches in facing real-world problems involving societal issues.

    Sometimes, Money Does Grow On Trees: Data-Driven Demand Response with DR-Advisor

    Real-time electricity pricing and demand response have become a clean, reliable and cost-effective way of mitigating peak demand on the electricity grid. We consider the problem of end-user demand response (DR) for large commercial buildings, which involves predicting the demand response baseline, evaluating fixed DR strategies, and synthesizing DR control actions for load curtailment in return for a financial reward. Using historical data from the building, we build a family of regression trees and learn data-driven models for predicting the power consumption of the building in real-time. We present a method called DR-Advisor, which acts as a recommender system for the building's facilities manager and provides suitable control actions to meet the desired load curtailment while maintaining operations and maximizing the economic reward. We evaluate the performance of DR-Advisor for demand response using data from a real office building and a virtual test-bed.
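
The data-driven power models can be illustrated with a minimal CART-style regression tree, a simplified stand-in for the family of regression trees DR-Advisor learns; the greedy SSE-minimizing split search is standard, while the dict-based tree representation and all names are illustrative assumptions.

```python
import numpy as np

def build_tree(X, y, depth=3, min_leaf=5):
    """Greedy regression tree: split on the (feature, threshold) pair
    minimizing the sum of squared errors; returns a nested dict."""
    if depth == 0 or len(y) < 2 * min_leaf:
        return {"leaf": float(y.mean())}
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= t
            if left.sum() < min_leaf or (~left).sum() < min_leaf:
                continue
            # Sum of squared errors after the candidate split.
            sse = (((y[left] - y[left].mean()) ** 2).sum()
                   + ((y[~left] - y[~left].mean()) ** 2).sum())
            if best is None or sse < best[0]:
                best = (sse, j, float(t), left)
    if best is None:
        return {"leaf": float(y.mean())}
    _, j, t, left = best
    return {"feat": j, "thr": t,
            "lo": build_tree(X[left], y[left], depth - 1, min_leaf),
            "hi": build_tree(X[~left], y[~left], depth - 1, min_leaf)}

def predict(tree, x):
    """Follow splits down to a leaf and return its mean target value."""
    while "leaf" not in tree:
        tree = tree["lo"] if x[tree["feat"]] <= tree["thr"] else tree["hi"]
    return tree["leaf"]
```

Trained on historical features such as weather and schedule variables (hypothetical inputs here), a forest of such trees could supply the real-time baseline predictions that DR strategy evaluation relies on.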

    The Coming of Age of Interpretable and Explainable Machine Learning Models

    Machine learning-based systems are now part of a wide array of real-world applications seamlessly embedded in the social realm. In the wake of this realisation, strict legal regulations for these systems are currently being developed, addressing some of the risks they may pose. This is the coming of age of the interpretability and explainability problems in machine learning-based data analysis, which can no longer be seen just as an academic research problem. In this tutorial, associated with the ESANN 2021 special session on “Interpretable Models in Machine Learning and Explainable Artificial Intelligence”, we discuss explainable and interpretable machine learning as post-hoc and ante-hoc strategies to address these problems and highlight several aspects related to them, including their assessment. The contributions accepted for the session are then presented in this context.

    A machine learning pipeline for supporting differentiation of glioblastomas from single brain metastases

    Machine learning has provided, over the last decades, tools for knowledge extraction in complex medical domains. Most of these tools, though, are ad hoc solutions and lack the systematic approach that would be required to become mainstream in medical practice. In this brief paper, we define a machine learning-based analysis pipeline for helping with a difficult problem in the field of neuro-oncology, namely the discrimination of brain glioblastomas from single brain metastases. This pipeline involves source extraction using k-Means-initialized Convex Non-negative Matrix Factorization and a collection of classifiers, including Logistic Regression, Linear Discriminant Analysis, AdaBoost, and Random Forests.
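
As a minimal sketch of the pipeline's classification stage, the following trains a plain gradient-descent logistic regression on per-case features (for instance, source-mixing coefficients from the factorization step). It is an illustrative stand-in for the classifiers named above, not the paper's implementation; all names and hyperparameters are assumptions.

```python
import numpy as np

def fit_logreg(X, y, lr=1.0, epochs=1000):
    """Batch gradient descent on the logistic loss (bias included)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))      # predicted probabilities
        w -= lr * Xb.T @ (p - y) / len(y)      # gradient step
    return w

def predict_logreg(w, X):
    """Hard 0/1 labels at the 0.5 probability threshold."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return (1.0 / (1.0 + np.exp(-Xb @ w)) >= 0.5).astype(int)
```

In a pipeline like the one described, the binary labels would encode glioblastoma vs. single metastasis, and model comparison across the listed classifiers would use the same feature matrix.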

    Extraction of artefactual MRS patterns from a large database using non-negative matrix factorization

    Despite the success of automated pattern recognition methods in problems of human brain tumor diagnostic classification, limited attention has been paid to the issue of automated data quality assessment in the field of MRS for neuro-oncology. Beyond some early attempts to address this issue, the current standard in practice is MRS quality control through human (expert-based) assessment. One aspect of automatic quality control is the problem of detecting artefacts in MRS data. Artefacts, whose variety has already been reviewed in some detail and some of which may even escape human quality control, have a negative influence on pattern recognition methods attempting to assist tumor characterization. The automatic detection of MRS artefacts should be beneficial for radiology, as it guarantees more reliable tumor characterizations, as well as the development of more robust pattern recognition-based tumor classifiers and more trustworthy MRS data processing and analysis pipelines. Feature extraction methods have previously been used to help distinguish between good- and bad-quality spectra before applying supervised pattern recognition techniques. In this study, we apply feature extraction differently and use a variant of a method for blind source separation, namely Convex Non-Negative Matrix Factorization, to unveil MRS signal sources in a completely unsupervised way. We hypothesize that, while most sources will correspond to the different tumor patterns, some of them will reflect signal artefacts. The experimental work reported in this paper, analyzing a combined short and long echo time 1H-MRS database of more than 2000 spectra acquired at 1.5T and corresponding to different tumor types and other anomalous masses, provides a first proof of concept that points to the possible validity of this approach.
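
Convex NMF can be sketched with the multiplicative updates of Ding, Li and Jordan's formulation: X ≈ (XW)Gᵀ with W, G ≥ 0, which handles mixed-sign spectra because the extracted sources F = XW are nonnegative combinations of the data themselves. This is a minimal illustration, not the paper's implementation; initialization, iteration counts, and names are assumptions.

```python
import numpy as np

def convex_nmf(X, k, iters=200, seed=0, eps=1e-9):
    """Convex NMF: X (m x n) ~ (X @ W) @ G.T with W, G >= 0 (n x k).
    Multiplicative updates split A = X.T @ X into its positive and
    negative parts, so mixed-sign data are handled directly."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    W = rng.random((n, k)) + eps
    G = rng.random((n, k)) + eps
    A = X.T @ X
    Ap, An = (np.abs(A) + A) / 2, (np.abs(A) - A) / 2
    for _ in range(iters):
        GW = G @ W.T
        G *= np.sqrt((Ap @ W + GW @ (An @ W) + eps)
                     / (An @ W + GW @ (Ap @ W) + eps))
        GtG = G.T @ G
        W *= np.sqrt((Ap @ G + An @ W @ GtG + eps)
                     / (An @ G + Ap @ W @ GtG + eps))
    F = X @ W  # extracted sources: candidate tumor-pattern / artefact prototypes
    return F, G
```

In the setting described above, each column of F would be inspected as a candidate signal source, with most expected to match tumor patterns and some to capture artefacts.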

    A Novel Semi-Supervised Methodology for Extracting Tumor Type-Specific MRS Sources in Human Brain Data

    Background: The clinical investigation of human brain tumors often starts with a non-invasive imaging study, providing information about the tumor extent and location, but little insight into the biochemistry of the analyzed tissue. Magnetic Resonance Spectroscopy can complement imaging by supplying a metabolic fingerprint of the tissue. This study analyzes single-voxel magnetic resonance spectra, which represent signal information in the frequency domain. Given that a single voxel may contain a heterogeneous mix of tissues, signal source identification is a relevant challenge for the problem of tumor type classification from the spectroscopic signal. Methodology/Principal Findings: Non-negative matrix factorization techniques have recently shown their potential for the identification of meaningful sources from brain tissue spectroscopy data. In this study, we use a convex variant of these methods that is capable of handling negatively-valued data and generating sources that can be interpreted as tumor class prototypes. A novel approach to convex non-negative matrix factorization is proposed, in which prior knowledge about class information is utilized in model optimization. Class-specific information is integrated into this semi-supervised process by setting the metric of a latent variable space where the matrix factorization is carried out. The reported experimental study comprises 196 cases from different tumor types drawn from two international, multi-center databases. The results indicate that the proposed approach outperforms a purely unsupervised process by achieving near-perfect correlation of the extracted sources with the mean spectra of the tumor types. It also improves tissue-type classification. Conclusions/Significance: We show that source extraction by unsupervised matrix factorization benefits from the integration of the available class information, operating in a semi-supervised learning manner, for discriminative source identification and brain tumor labeling from single-voxel spectroscopy data. We are confident that the proposed methodology has wider applicability for biomedical signal processing.
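
The reported evaluation criterion, correlation of the extracted sources with the mean spectra of the tumor types, can be sketched as follows; the function name and data layout are assumptions.

```python
import numpy as np

def source_class_correlation(sources, spectra, labels):
    """For each class, the best Pearson correlation between its mean
    spectrum and any extracted source (higher = better source match).
    sources: (k x d) array; spectra: (n x d); labels: (n,) class ids."""
    out = {}
    for c in np.unique(labels):
        mean_spec = spectra[labels == c].mean(axis=0)
        corrs = [np.corrcoef(s, mean_spec)[0, 1] for s in sources]
        out[int(c)] = max(corrs)
    return out
```

Under this criterion, "near-perfect correlation" means every tumor type's mean spectrum finds an extracted source with a correlation close to 1.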
